Conversation
There was a problem hiding this comment.
I just looked over the parts, where the behaviour of rdflib plays a role, because i dont fully understand what the code itself does. I hope my comments are a little helpful.
I have another more generell comment, as BNode is a subclass to str if you eg. check types with mypy you might get no error, when you by mistake compare a str to a BNode. But this would always fail. this might play a role, when you call hash_related_blank_node. But as i said, i have not a full grasp on the program.
| encoded = self._quote_encode(l_) | ||
|
|
||
| if l_.language: | ||
| if l_.datatype: |
There was a problem hiding this comment.
I would remove this test if l_.datatype: in rdflib, if language is given, the datatype should always be forced, so this test for datatype should always return the same. As far as i remember the correct datatype for lang tagged literals is rdf:langstring, so there might be in future a change to return rdfs:langstring instead of None.
There was a problem hiding this comment.
Agreed! But this is code copied (and cleaned) from rdflib's nquads serializer, so I'll mix it in a PR there.
This PR is a rather big one. It fully implements URDNA2015, URDNA2012 and RDFC10 and has complete test-suite coverage on these algorithms (with the exception of the one on poisonous datasets, but more about that later). It 'fixes' #220
Some general remarks before I dive into the details:
canon.py. This made the code much much cleaner, but ironically didn't fix what I introduced it for: nquad serialization. The relevant methods are copied over from rdflib until I can turn them into PRs for rdflib.rdflib.Datasetand back. This should ensure backwards-compatibility on some methods such asjsonld.normalization()andURDNA2015.main().Overview of the changes:
rdflibdependency in all relevant config and code files.rdflibmostly changedutil.pyfrom_legacy_dataset()andto_legacy_dataset()to easily go from anrdflib.Datasetto an RDFJS-likedictwherever needed.tests/test_util.pyfor these functions. We might want to move more general-purpose methods to this file.canon.py:URDNA2015._canonicalize(self, dataset: Dataset)while handling input and output remained inURDNA2015.main(). The latter now does the nquads parsing instead ofJsonLdProcessor.normalize(), so all parsing and serialization is handled by the same class.URDNA2015._canonicalize(self, dataset: Dataset)accepts anrdflib.Datasetand returns a tuple withstranddict.URDNA2015 .main(self, dataset: str | dict | Dataset, options)now accepts ardflib.Datasetobject in addition to an nquadsstror the original RDFJS-likedict. It returnsstr: the serialized nquads result, ordict: the result as RDFJS-like dataset or the blank node identifier map when the new parameteroutputMapisTrue.URDNA2015.hash_algorithmso it easily configurable (required for RDFC1.0)permutations()function now usesitertools.permutationsinstead of a custom implementation._nq_rowand_quoteLiteral, which should eventually move to a fix for rdflib's nquads serializer.tests/runtests.pyIt's possible that we move this code out before ever merging, but at least it can be easily tested. That said, when importing this functionality as an external library, we include rdflib anyway. It therefore makes sense to continue replacing the RDFJS-like data structures elsewhere in the code, essentially making it fully rdflib compliant and dependent.
Since this involves RDFLib: @nicholascar, anyone who can maybe have a look at this? Also @davidlehn @BigBlueHat @anatoly-scherbakov